From Lasso regression to Feature vector machine

نویسندگان

  • Fan Li
  • Yiming Yang
  • Eric P. Xing
چکیده

Lasso regression tends to assign zero weights to most irrelevant or redundant features, and hence is a promising technique for feature selection. Its limitation, however, is that it only offers solutions to linear models. Kernel machines with feature scaling techniques have been studied for feature selection with non-linear models. However, such approaches require to solve hard non-convex optimization problems. This paper proposes a new approach named the Feature Vector Machine (FVM). It reformulates the standard Lasso regression into a form isomorphic to SVM, and this form can be easily extended for feature selection with non-linear models by introducing kernels defined on feature vectors. FVM generates sparse solutions in the nonlinear feature space and it is much more tractable compared to feature scaling kernel machines. Our experiments with FVM on simulated data show encouraging results in identifying the small number of dominating features that are non-linearly correlated to the response, a task the standard Lasso fails to complete.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Spatio-temporal feature selection for black-box weather forecasting

In this paper, a data-driven modeling technique is proposed for temperature forecasting. Due to the high dimensionality, LASSO is used as feature selection approach. Considering spatio-temporal structure of the weather dataset, first LASSO is applied in a spatial and temporal scenario, independently. Next, a feature is included in the model if it is selected by both. Finally, Least Squares Supp...

متن کامل

Mammalian Eye Gene Expression Using Support Vector Regression to Evaluate a Strategy for Detecting Human Eye Disease

Background and purpose: Machine learning is a class of modern and strong tools that can solve many important problems that nowadays humans may be faced with. Support vector regression (SVR) is a way to build a regression model which is an incredible member of the machine learning family. SVR has been proven to be an effective tool in real-value function estimation. As a supervised-learning appr...

متن کامل

Classification with correlated features: unreliability of feature ranking and solutions

MOTIVATION Classification and feature selection of genomics or transcriptomics data is often hampered by the large number of features as compared with the small number of samples available. Moreover, features represented by probes that either have similar molecular functions (gene expression analysis) or genomic locations (DNA copy number analysis) are highly correlated. Classical model selecti...

متن کامل

Safe Feature Elimination for the LASSO and Sparse Supervised Learning Problems

We describe a fast method to eliminate features (variables) in l1-penalized least-square regression (or LASSO) problems. The elimination of features leads to a potentially substantial reduction in running time, especially for large values of the penalty parameter. Our method is not heuristic: it only eliminates features that are guaranteed to be absent after solving the LASSO problem. The featu...

متن کامل

Sparsity Regularization for classification of large dimensional data

Feature selection has evolved to be a very important step in several machine learning paradigms. Especially in the domains of bio-informatics and text classification which involve data of high dimensions, feature selection can help in drastically reducing the feature space. In cases where it is difficult or infeasible to obtain sufficient training examples, feature selection helps overcome the ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2005